Yue Huang

Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

May 27, 2026
Via arXiv

JobBench: Aligning Agent Work With Human Will

May 25, 2026
Via arXiv

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

May 13, 2026
Via arXiv

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

May 12, 2026
Via arXiv

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

May 11, 2026
Via arXiv

Why Search When You Can Transfer? Amortized Agentic Workflow Design from Structural Priors

Apr 27, 2026
Via arXiv

AlphaContext: An Evolutionary Tree-based Psychometric Context Generator for Creativity Assessment

Apr 21, 2026
Via arXiv

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

Apr 14, 2026
Via arXiv

Bringing Clustering to MLL: Weakly-Supervised Clustering for Partial Multi-Label Learning

Apr 10, 2026
Via arXiv

Feature-Label Modal Alignment for Robust Partial Multi-Label Learning

Apr 10, 2026
Via arXiv